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Abstract — This paper proposes a fully decentralized adaptive 
re-weighted state estimation (DARSE) scheme for power systems 
via network gossiping. The enabling technique is the proposed 
Gossip-based Gauss-Newton (GGN) algorithm, which allows to 
harness the computation capability of each area (i.e. a database 
server that accrues data from local sensors) to collaboratively 
solve for an accurate global state. The DARSE scheme mitigates 
the influence of bad data by updating their error variances 
online and re-weighting their contributions adaptively for state 
estimation. Thus, the global state can be estimated and tracked 
robustly using near-neighbor communications in each area. 
Compared to other distributed state estimation techniques, our 
communication model is flexible with respect to reconfigurations 
and resilient to random failures as long as the communication 
network is connected. Furthermore, we prove that the Jacobian 
of the power flow equations satisfies the Lipschitz condition that 
is essential for the GGN algorithm to converge to the desired 
solution. Simulations of the IEEE-118 system show that the 
DARSE scheme can estimate and track online the global power 
system state accurately, and degrades gracefully when there are 
random failures and bad data. 

Index Terms — hybrid state estimation, convergence, gossiping 

I. Introduction 

Traditionally, power system state estimation (PSSE) has 
been solved by the iterative Gauss-Newton (GN) procedure 
(or its variants) using measurements from Supervisory Control 
and Data Acquisition (SCAD A) systems |3|. Recently, Phasor 
Measurement Units (PMU) in the Wide-Area Measurement 
Systems (WAMS) are gaining increasing attention since state 
estimation using PMU data becomes a linear least squares 
problem However, due to the limited deployment of 
PMUs, researchers have proposed hybrid estimation schemes 
t)y integrating WAMS and SCADA measurements. 
Some of these methods incorporate the PMU measurements 
into the GN update |7J-|0|, while others use PMU data to 
further refine the estimates from SCADA data (TOl, (TT). 

A. Related Work 

Driven by the ongoing expansion of the cyber infras- 
tructure of power systems and fast sampling rates of PMU 
devices, there has been growing concern on distributing the 
computations across different areas by fusing information in 
various ways ||4|, |T2|-pO|. In most of these methods, each 
distributed area solves for a local state and refines the local 
estimates in a hierarchical manner by leveraging on the tie-line 
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Structure and tuning the estimates on boundary buses shared 
with neighboring areas. Although these methods considerably 
alleviate the computational burdens at control centers, they rely 
on aggregation trees that require centralized coordination and 
depend on the power grid topology. This is clearly a limitation 
in reconfiguring the system, if random failures or attacks call 
for it. Last but not least, this class of methods typically requires 
local observability provided by redundant local measurements, 
which may not be satisfied during contingencies. 

Recently, the authors of | [2T| , p2| proposed distributed state 
estimation schemes for power systems that do not require local 
observability. Specifically, (22) follows a similar formulation 
as in fTSl-fTTl and uses the alternating direction method 
of multipliers (AD-MoM) to distribute the state estimation 
procedure. The merit of AD-MoM in p2) lies in the fact that 
the computation is local and scalable with respect to the num- 
ber of variables in each area. However, the communications 
required by the AD-MoM scheme are constrained by the grid 
topology. Also, the numerical tests performed in [22 1 are based 
exclusively on PMU data and the algorithm convergence in the 
presence of SCADA measurements is not discussed in general. 
While the advantages of hybrid state estimation schemes are 
evident from fSj, 16], these papers do not provide analytical 
proof nor performance guarantee for the convergence. 

In order to obtain the global state, the approach taken in 
|2T| is inspired by a number of recent advances in network 
diffusion algorithms for optimization. Diffusion algorithms are 
capable of solving an array of problems in a fully decentralized 
manner without any hierarchical aggregation, including linear 
filtering | [23| , convex optimizations |24| and adaptive estima- 
tion f25l . These techniques combine a local descent step with a 
diffusion step, which is performed via network gossiping. The 
convergence of these algorithms relies on the convexity of the 
cost function and a small (or diminishing) step-size, which 
slows down the algorithm in general. Furthermore, PSSE with 
SCADA measurements is non-convex and it is not clear how 
these methods will perform in practice. 

Compared to other decentralized methods including |T|, 
1 2 1, another major issue addressed in this paper is bad data 
processing. There has been extensive work devoted to the 
detection and identification of bad data in power systems, 
mainly divided into two categories. The first category is 
usually handled by x^-test and the largest normalized residual 
(LNR) test (31 , (26|-(29|. There are also specific detection 
schemes using PMU measurements in different ways [ 30| , 
|31 1. To reduce the bad data effects, LNR tests are performed 
successively, which re-estimates the state after removing or 
compensating the data entry with the largest residual (T7| , 
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p2| . Based on the LNR principle, p3| developed a distributed 
scheme where different areas exchange measurement residuals 
and successively re-estimate the state, until no further alarms 
are triggered. To avoid repetitive computations, p4| , p5| 
suggested computing ^"^^ LNR test statistics as a rule-of- 
thumb to identify bad data in one pass, where the state only 
needs to be re-estimated one additional time. Furthermore, the 
work | [22| , |[36| proposed a convex optimization approach to 
directly estimate the bad data jointly with the state variables by 
integrating a sparsity constraint on the number of error outliers. 
The second approach for bad data processing, on the other 
hand, is to suppress their effects on the state estimates instead 
of removal. For instance, [ [37| , jSSj propose incorporating 
different weights for the residuals to limit the impacts of bad 
data on the state updates. A more comprehensive literature 
review on bad data processing can be found in p9| , | [4Q| . 

B. Contributions 

In this paper, we formulate the state estimation problem in 
a Maximum Likelihood (ML) framework, and develop the De- 
centralized Adaptive Re-weighted State Estimation (DARSE) 
scheme specifically for wide-area PSSE. The DARSE scheme 
deals with bad data similarly to [[37|, p8| , where the bad data 
variances are adaptively updated based on the measurements. 
Furthermore, the DARSE scheme generalizes the Gossip-based 
Gauss-Newton (GGN) algorithm we proposed in [41 1 under an 
adaptive setting, which exhibits faster convergence than the 
distributed state estimation scheme in | [2T| derived from first 
order diffusion algorithms. Another important contribution of 
this paper is that we prove sufficient conditions for the con- 
vergence of GGN algorithm, by showing that the Jacobian of 
power flow equations satisfies strictly the Lipschitz condition. 
Furthermore, thanks to the adaptive features of the DARSE 
scheme, it automatically adjusts the weights for different 
sensor observations based the measurement quality, to reduce 
the impacts of bad data on the state estimates. The main benefit 
of the DARSE scheme is that it is completely adaptable to both 
time- varying measurements quality and network conditions. 
The quality of measurements can degrade in a random fashion 
and also the communication network can experience failures. 
With mild connectivity conditions, DARSE is able to deliver 
accurate estimates of the global state at each distributed area. 
Our claims are verified numerically on the IEEE-118 system. 

C. Notations 

In this paper, we used the following notations: 

• i: imaginary unit and R and C: real and complex numbers. 

• and ^{-j: the real and imaginary part of a number. 

• Iat: an X identity matrix. 

• 1 at: an A/" X 1 vector with all entries equal 1. 

• II A II and || A||i? are the 2-norn:j^and F-norm of a matrix. 

• vec(A) is the vectorization of a matrix A. 

• A^, Tr(A), Amin(A) and Amax(A): transpose, trace, 
minimum and maximum eigenvalues of matrix A. 

• (g) is the Kronecker product and E[-] means expectation. 

^The 2-norm of a matrix is the maximum of the absolut e value of the 
eigenvalues and the 2-norm of a vector x G is ||x|| = y^n^i ^'ri- 



ll. Power System State Estimation 

The power grid is characterized by buses that represents 
interconnections, generators or loads, denoted by the set 
A/" = {1, • • • , N}. The grid topology is determined by the edge 
set £ = {{n, m}} with cardinality \£\ = E, which correponds 
to the transmission line between bus n and m. The Energy 
Management Systems (EMS) collect measurements on certain 
buses and transmission lines to estimate the state of the power 
system, i.e., the voltage phasor Vn ^ C at each bus n G A/". In 
this paper, we consider the Cartesian coordinate representation 
using the real and imaginary components of the complex volt- 
age phasors V = [3?{Vi}, • • • , ^{Vn}, ^{Vi}, • • • , ^{Vn}]^ . 
This representation facilitates our derivations because it ex- 
presses PMU measurements as a linear mapping and SCADA 
measurements as quadratic forms of the state v as in | [42| . 

A. Measurement Model 

Since there are 2 complex injection measurements at each 
bus and 4 complex flow measurements on each line, this 
amounts to twice as many real variables. Thus the measure- 
ment ensemble has M = AN + SE entries in an aggregate 
vector partitioned into four section^ 

z[t] = [z^[t],z^[t],z^[t],z^[t]f, (1) 

containing the length-2A^ voltage phasor zv[t] and power 
injection vector zx[t] at bus n e JV, the length-4£^ current 
phasor zc[t] and power flow vector zjr[t] on line {n^m) G £ 
at bus n. These measurements are gathered at a certain 
periodicity T^e for state estimation. In contrast to the slow 
rate SCADA measurements, since PMU devices also provide 
fast samples for dynamic monitoring and control, some pre- 
processing is needed to align measurements that come at 
widely different rates. This is an important practical issue 
prior to the state estimation, however it is unrelated with the 
estimation methodology considered here and therefore is left 
for future investigation. State estimation is performed using 
SCADA measurements {zx[t], zjr[t]} and PMU measurements 
{zv[t], zc[t]} that have been pre-processed and aligned. 

Defining the power flow equations f(.)(v) in Appendix [a| 
and letting v[t] be the true state at time t, the individual vector 
= contains observations corrupted by 

measurement noise T(.)[t] that arises from instrumentation im- 
precision and random outliers whose variances are potentially 
much larger due to attacks or equipment malfunction. The 
entries that have large variances are what we call bad data. 
Then we have the measurement model below 

z[t] =f(v[t]) + r[t], (2) 

where 

r[t] = [rUtU^[tUm,rmf (3) 

f(v) = [f^(v),fj(v),fj(v),fj(v)]^. (4) 

A practical data collection architecture in power systems 
(compatible with WAMS and SCADA) consists of / inter- 
connected areas, where each area records a subset of z in 

^Subscripts {V,C,X, J^} mean voltage, current, injection and flow. 
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([2]). To describe the measurement selection in each area, we 
define a binary mask T^^(.) G {0, with Mi rows 
and M columns, where each row is a canonical basis vector 
= [0, • • • A-,' ' ' 7 0]^ picking the corresponding element 
in each category from {V,C,X, J^}. Letting 

be the selection matrix in the i-\h area and applying this mask 
on z, the measurements in the i-th area is selected as 



c,[t] =f,(v[t]) + r,[t], 



(5) 



where c,[t\ ^ T,z[t\ = [<vM. <cM. and 
similarly = T^f (•), r^t] = TiT[t]. We assume that r4t]'s 
are Gaussian and uncorrelated between different areas, which 
has an unknown covariance denoted by 



Kilt] = diag[ei,i[t], • • • ,ei,Mj^]], 



(6) 



where Mi = Miy + Mi^c + + is the total number 
of observations from each type of measurements. 

B. Maximum Likelihood (ML) Estimation 

Using the measurement model in ([5]), the ML estimate of the 
state is obtained by maximizing the likelihood function with 
unknown noise covariances {Ri[t]}[^^ over the state space V 

{v[t], R,[t]} = arg^max logP ({c,M}[^, | v, {R^Mj^^^) , 

which is equivalent to minimizing the following cost function 

min ^||c4t]-fi(v)||^-.+^log|Ri|. (7) 

Note that due to the quadratic nature of power flow equations 
in Appendix |A| the cost function ([7]) is highly non-convex in 
V and there exist multiple stationary points v"^ in the set V"^ 
that result in local minima of ([7]). According to |43|, these 
stationary points can be determined by setting the derivatives 
with respect to v and {ei,rn}^ii ..'.^m ^^^^ 



^ Ff K)R* [c,[t] - f,K)] =0, V* e V* (8) 

i=l 

Clearly, the ML estimate v[t] achieve the global minimum of 
^ and belongs to the set V^. Solving for the ML estimate v[t] 
would require substituting (with the unknown v"^) back 
into R^ in ^ and searching for the point v"^ that achieves 
the global minimum of (|7]). Therefore, the joint ML estimates 
are obtained by equivalently solving the coupled equations 

/ 

vM = argmin^ llcM - f,(v[t]) |||-.[,] (10) 



R^t] =diag[ 



.(vM)l^ 
At].'-'] 



(11) 
(12) 



where we have used the fact that given the ML estimate of 
the noise variance ei,m[^], the stationary points of the non- 
linear least squares (NLLS) problem in ([TO]) are identical to 



that in ([8]). This equivalence can be regarded as the non-linear 
version of the linear estimation with nuisance parameters in 
|43|. However, this approach is still highly non-linear and 
complex. In the following, we take advantage of the streaming 
measurements to switch adaptively between estimating the 
state ([T0| and estimating the variances ([TT]). 



C. Adaptive Re-weighted State Estimation 

If the noise covariance is known, the state ([T0| can be 
obtained directly from conventional PSSE |3|. Therefore, we 
progose to use the previous covariance estimate as a substitut^ 
of ^i[t\, i = 1^ ^ I io re- weight the measurements in the 
current snapshot, and propose the Adaptive Re-weighted State 
Estimation (ARSE) schem^in Algorithm [l] 

Algorithm 1 ARSE Scheme 

1: Predict outlier covariance = R^^ — 1]. ^ = Ir ' • ? ^ 
2: Update state estimates 

w[t] = argmin^ [cM - U{^)f T^^ [ci[t] - fi(v)] 
3: Adjust covariance Ri[t] = diag[?^^i[t], • • • ^Ci^MAA] 

?i,.nM = |Q,.nM-/^,m(vM)p. (13) 



Our objective is to harness the computation capabilities in 
each area to perform state estimation ([T0| and ([TT]) online in 
a decentralized fashion, as shown on the right in Fig. [T] Note 
that each area estimates the global state v, rather than the 
portion that pertains to its local facilities. Since step (1) and 
step (3) in Algorithm [T] are decoupled between different areas, 
their decentralized implementations are straightforward. Now 
we omit the time index t and focus on solving step (2) 



V = arg mm 



El 



f.(v) 



(14) 



where fi(v) = F- ^fi(v) and = F- ^c^. Note that we 
propose to solve ([14]) in a decentralized setting, where each 
area has a local estimate that is in consensus with other 
areas i' ^ i and converges to the global estimate v. 

Traditionally, state estimation solvers attempt to find the 
global minimum v numerically by Gauss-Newton (GN) algo- 
rithm iterations, whose updates are given by 

vf+i = Pv [vf + ] , = Q-i(vf )q(vf ), (15) 

where Pv(-) is a projection on the space V, q(vj^) and Q(v^) 



^In general, a better substitute can be predicted using the temporal statistics 
of the random process r[t], but here we simply use the previous estimate. 

^If desired, one can iterate once again the state estimation after the outlier 
covariance has been updated to give a better state. 
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Hierarchical Aggregation 



Decentralized Architecture via Gossiping 



Algorithm 2 DARSE Scheme 



Power Injection (P„,Q„) / ^\ Voltage Phasor (J{{y„}, 9{y„}) 
Power Flow (P„^, ^ Current Phasor J„^) 

SCADA X \ V WAMS ci=fi(v) + ri 



# \A 



C2 = f2(v) +r2 





Fig. 1 . Centralized architecture v.s. the decentralized architecture for DARSE. 
are scaled gradients and GN Hessian of the cost function 



q(vf ) = J E - fpK')) (16) 

p=l 

Q(v') = yEFj(vf)F,(vf), 



with Fi{v) = r. ^Tidf(v)/dv^ = T- 'TiF(v) computed 
from F(v) in Appendix |a| However, each area knows only 
its own iterate vf and partial measurements and power 
flow equations f^(-) in ([16]), which makes it impossible to 
implement step (2) in a decentralized setting. 



in. Decentralized State Estimation and Tracking 



As discussed in Section [Il-Cj it is straightforward to decen- 
tralize the computations for step (1) and step (3) in Algorithm 
[T] The key enabling technique we propose for step (2), is 
the Gossip-based Gauss-Newton (GGN) algorithm that we 
proposed in (4T\. Next we describe the GGN algorithm to 
make the paper self-contained. We interchangeably use area 
and agent to refer to the entity communicating and performing 
the computation. There are two time scales in the GGN 
algorithm, one is the time for GN update denoted by the 
discrete time index "/c" and the other is the gossip exchange 
between every two GN updates denoted by another discrete 
time index All the agents have a clock that runs syn- 
chronously and determines the instants t = for the /c-th GN 
update across the network. During the interval t G [r/e,r/e+i), 
the agents communicate and exchange information with each 
other at random times Tk^i G [r/c,r/c+i) over £ = 1, • • • 
interactions. In the following, we describe the local update 



model at each agent in Section III-A and introduce in Section 



III-B the gossiping model between every two updates. 



A. Local Update Model 

The idea behind our GGN algorithm is to take advantage of 
the structure of the centralized update ([16]) as a sum of local 
terms and obtain an approximation of the sum by computing 
the scaled average via gossiping, which is interlaced with 
the optimization iterations. Therefore, we use the "network 
average" of different areas as surrogates of q(v^) and Q(v^), 



1: Predict outlier co variance at each agent = R^^ — !]• 
2: All agents run the GGN Algorithm via network gossiping. 



obtain initial variables at all agents i G X. 

set A: = 0. 

repeat 

set /c = /c + 1 . 

initialization: Obtain Hk,i{0) in ([19]) at each agent i. 
gossiping: Each agent i exchanges with neighbors 
under URE protocol for 1 < £ < according to ([29]). 
local update: Each agent i updates according to ( [23] ). 
until k = K or - < e and set v,- = vf. 



Adjust CO variance Ri[t] 



^[t]-km{m?■ 



which can be obtained via gossiping 



h, = i^Ff(vf) (c,-f,(vf: 



^^0 = jEFf(vf)F,(vf). 

i=l 



Define local vector at the i-th agent for the ^-th gossip 

evolving from initial conditions 

hfc,,(0)^Fr(vf)(c,-f,(vf)) 



hfe,i(£) 
vec[Hfc,i(£)] 



Hfc,,(0)4Ff(vf)F,(vf). 



(17) 
(18) 

(19) 

(20) 
(21) 



Clearly, the surrogates are the averages of the initial conditions 
hfc = ^ hk,m/I, Hfc = ^ Hfc,,(0)//. (22) 

i=l i=l 

To compute this average in the network, all agents exchange 
their information 'Hk,i{t) 'Hk,i{^ + 'i-) using the communi- 
cation model, more precisely the gossip exchange equation 
( |29l l, as described below in Section | III-B | Then after £k 
exchanges, the local GGN descent at the {k + l)-th update 
for the i-th agent is 

vf+i=Pv[vf +d?(4)], 
df (4) = H^ -(4)11^(4). 



(23) 
(24) 



Finally, the Decentralized Adaptive Re-weighted State Esti- 
mation (DARSE) scheme is described in Algorithm [2] 

B. Uncoordinated Random Exchange (URE) Protocol 

In this section, we introduce the Uncoordinated Random 
Exchange (URE) protocol, a popular gossip algorithm studied 
in literature | [24| , | [44| , pSJ . For each exchange in the URE 
protocol, an agent i wakes up and chooses a neighbor agent 
j to communicate during [r/c,r/c+i). The communication net- 
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work is thus a time-varying graph Qk/ = {T^Mk/) during 
[^k/iTk.i+i) for every GN update k and gossip exchange i. 
The node set X = {1, • • • , /} contains each agent in different 
area, and the edge set j} G Al/c,^ is formed by the commu- 
nication Hnks for that particular gossip exchange, which can 
be haracterized by the adjacency matrix = [A-^'^^^ 



^•^ 1 0, otherwise 



\IxI 



(25) 



Condition 1. The composite graph Qk = {^i\J^=iM.k,i'} 
for the k-th update is connected for all £ > and there exists 
an integer L > 1 such that for every agent pair {i, j} in the 
composite graph, we have for any ^ > 



(26) 



The above assumption states that all agent pairs {i^j} that 
communicate directly infinitely many times constitute a con- 
nected network Q^, and furthermore, there exists an active link 
between any agent pair {i^j} G Qk every L consecutive time 
slots [r/e,£,r/e,^+L-i] ^ (r/c,r/e+i) for any £. 

The gossip exchanges are pairwise and local | [46| , where 
agent i combines the information from agent j with a certain 
weight (3. Define a weight matrix Wk{£) = [W^Ji)]ixi as 



13, 
(1 
0, 



j j^ijj e Mk,e 
j = i 
otherwise 



(27) 



representing the weight associated to the edge j}. There- 
fore, the weight matrix {£) has the same sparsity pattern as 
the communication network graph A/e(^), and it is determined 
by the agent connectivity. Suppose agent Ik^i wakes up at 
^k,i G [^ki^k+i) and Jk^i is the node picked by node Ik^i 
with probability 'yik,i,Jk,i- The weight matrix is then 



Wk{i) =1-/3 (e/, ^ + ej, ,) (e/, , + ej, , 



(28) 



Finally, each agent i 
with neighbors as 



1 , • • • , / mixes its local information 



(29) 



Remark: Note that the typical URE protocol is random and 
asynchronous, which may not satisfy Condition [T] due to link 
formations and failures. The simulations in |4| show that the 
overall delay from substations in a IEEE- 14 bus system to 
the control center is around 2ms with bandwidth 100-1000 
Mbits/s. Thus we bound the worst case hop delay by discount- 
ing it with the network diameter 2/7 ^ 0.6ms. We assume that 
the state estimation here is performed every 10 seconds rather 
than today's periodicity (minutes) If information is stored 
with 64-bits per entry, the data packets sent by each agent per 
exchange has 64 (2 A/" + 4A/'^)-bits. For a power system with 
= 118 buses with a communication bandwidth 100 Mbits/s, 
the maximum exchange that can be accomodated in 10 seconds 
is 10xlOV(0.6x 10-^x10^ + 64(2x118+4x1182)) ^ 300, 
which is sufficiently large to avoid violating Condition [T] 
Much fewer exchanges were used in our simulations, but the 



algorithm still converges with good performance. 

IV. Convergence and Performance Guarantees 
Condition 2. First, we impose the following conditions: 

(1) The state space V is closed and convex. 

(2) The maximum and minimum costs are bounded and finite 



I 

max ^ \\ci - fi(v)||j^-i < OQ (30) 



< oo. 



(31) 



(3) The maximum and minimum eigenvalues of the GN Hes- 
sian are non-zero and finite 



CTmin = mm 

vGV 



. A^in I^^Ff (v)F,(v)j > (32) 



CTmax = min . Amax (v)Fi(v) < OO. (33) 

^^^^ \tl J 

Condition 2-(l) is easily satisfied by setting standard voltage 
limits and conditions 2-(2) & 2-(3) can be guaranteed if the 
power system is observable | [47| and the measurement noise 
is finite. Next, we prove that the Jacobians {F^(v)}[^^ satisfy 
the Lipschitz condition, which is important for the convergence 
of GGN algorithm in the DARSE scheme. 

Lemma 1. The Jacobian matrix F^(v) satisfies the Lipschitz 
condition for all i and arbitrary v, G V 



F.(v)-F,(v') 



< W V ■ 



Vi = l,---,/ (34) 



where uj is the Lipschitz constant given in ( |6Q| ). 

Proof: See Appendix |B] ■ 

Corollary 1. Given Condition |2] Lemma |7] and ^ pS] Theorem 
12.4], the following functions satisfy the Lipschitz conditions 
for all^y G V 

Ff(v)(c,-f,(v))-Ff(vO(c,-f,(vO) 
||Ff(v)F,(v)-Ff(vOF,(vO 



< ^5 ||V - 

< ||V ■ 



where vs cj(emax + CTmax) and ua ^cr^ 



A. Convergence Analysis 

The GGN algorithm is initialized with at each agent and 
continues until a stopping criterion is met. Since ([14]) is NLLS 
problem that is non-convex similar to ([7]), the iterate may 
stop at any fixed point v"^ of ([15]) satisfying the first order 
condition similar to dSl) 



1 

^Ff(v*) (c,-f,(v*))=0. 



(35) 



Clearly, the ML estimate v in ( [T4| ) is one of the fixed points 
and it is desirable to have the algorithm converge to this point. 
After we have proven Lemma [T] and Corollary [T] for power 
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systems, the analysis presented in our paper | [4T| is particularly 
useful here. In the following, we impose a condition on the 
gossip exchange according to our analysis in | [4T| . 



Condition 3. [41] We assume that {^/e}^o ^^^^^j 



/IL 



< oo. 



Given r] = min{/3, (1 — /3)} and < <^ 1, the minimum 
total exchange £^ — min/e {^/c} satisfies 

IL\og{n/C) 



L > 



log (1-7?^^)' 

where C = Ci(72(l + DC2) is a scaling constant given by 

(l + r,-^^) 



(36) 



Co = iy 



The resultant theorem from pT| can be re-stated as follows. 



Theorem 1. 4^ Lemma 1, 2, 3 & Theorem 1] Based 
on Lemma |7] and Condition [7] |2] |j] ^/z^^ ^/z^ discrepancy 
between the local update ( [23] ) and the exact update ( p3] ) 
bounded for all i and k 

||cl,^(4)-df|| </t (37) 

a^J the error between generated by ( [23] ) ^/z^ ML 
estimate v defined in ( [35] ) satisfies 

.ii2 



(38) 



where Ti = uj/2amin cind T2 = V^cjemin/Cmm- Assuming 

(i.e., T2 <C 1), and that n < 



that V^CJemin < CTmi 

(1 — /4Ti, then for any satisfying 

II ^11 ^ 2<Jjxiin 
V7 — V < 



(39) 



^/z^ asymptotic estimation error can be bounded as 

limsup ||vf+^ - v|| < (40) 

Theorem [1] implies that all the agents will converge to an 
arbitrarily small neighborhood of the ML estimate v if each 
area is initialized with a value that is sufficiently close to the 
ML estimate. Our simulations show that one can converge 
to the ML estimate from quite inaccurate initial points, even 
violating ( [39] ). Also, Condition [3] is imposed on the number of 
exchanges ik to control the convergence rate (i.e., determined 
by Ti and T2) and lower the numerical error This condition 
is influenced by many factors, such as the number of agents. 
More specifically, the number of measurements, measurement 
type and measurement location can in fact greatly affect the 
parameters {cmin, Cmax, cTmin, cTmax} in Condition |2] which in 
turn influences Ti, T2 and £^ that determine the convergence 
rate of the algorithm. 



simple choice is to = ii, and ik = ^k-i + 



B. Performance Guarantee by PMU Initialization 

Theorem [1] suggests that if the GGN iterate is initialized 
in a certain neighborhood of v, it converges to v with an 
error k, resulting from gossiping. This means that a good 
initialization v around the the ML estimate v is important. 
In fact, an effective initializer is the re- scaled average of the 
voltage measurements y of all areas because it measures 
the state directly. Next we propose a heuristic initialization 
scheme, which is shown to converge numerically. 

1) Centralized PMU Initialization v^.- As new measure- 
ments become available, a reasonable approach to initialize 
the state estimates is to use a combination of the previous 
state estimate for the buses where there are no PMU installed 
and the direct state measurements given by the PMU when 
available. The mathematical expression for this choice stated 
below helps describing and motivating the decentralized ini- 
tialization scheme that follows. 

The PMU data vector zy[t] in ([2]), records a portion of the 
state. By permuting the entries of Ci^v[^] with the matrix Ti^, 
we can relate ([2]) and the decentralized model ([5]) as follows 



(41) 



where, by virtue of of T^^y structure, the matrix I^^y = 
TfyTiy is a masked identity matrix with non-zero entries on 
locations that are PMU-instrumented. Letting ly = h,v^ 
we have Ivzv[t] = Yli=i Tfv^*,vM. Therefore, the central- 
ized initializer that merges PMUs measurements and outdated 
estimates can be written as 



vO = IvzvM + (I-Iv)svM, 



(42) 



where Sy [t] can be chosen arbitrarily (e.g. a stale estimate). It 
is of great interest to investigate the placement of PMU devices 
so that certain metrics are optimized, such as observability 
| [47| , state estimation accuracy | [49| , | [50| and mutual informa- 
tion | [5T| . This issue is not pursued here and the numerical 
results are based on an arbitrary choice. 

2) Decentralized Initializer via Gossiping: The "exact 
central initializer" in ([421 can be re- written as 



I-Y,TfyTiy\sv[t]. (43) 

V i=l / 



i=l 



However, '^Iv^hvi'^] aggregated PMU measure- 

ments, while the i-th agent can only access its local measure- 
ments TfyC^ vM - Since Xl[=i ^^fv^^^vM is written as a sum, 
it can be obtained via gossiping with the initial state 



Vi(0) = T^vCi,vM, 



(44) 



where the agents proceed the exchange with the URE protocol 
Vi{i) ^ Vi{i -\- 1). Finally, the decentralized initializer is 



v? = /V.(^) + (I-Iv)svM, 



(45) 



where ly = sgn[Vi(^)] and sgn[-] is the sign function 
sgn[v] = 1 for V 7^ and otherwise. If i is sufficiently 
large, v-^ converges to the centralized initializer in ( [42] ). 
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V. Numerical Results 

In this section, we illustrate the Mean Square Error (MSE) 
performance of the DARSE scheme. Given the distributed 
estimate in each area {V^^}^^i at each GN update, the MSE 
with respect to the voltage magnitude and voltage phase at the 
i-th site is 



MSE 



VA 



N 



N 



\v. 

I z,n 



)^MSE^^^. 



zy. 



In particular, the metric used in our comparisons are the 
cost in ([14]), Valfc = ^Li ll^iM - fi(vf evaluated 
using the decentralized estimates at each update, and the global 



.y^^ and MSE^^^ =ELi MSE, 



MSE^^^ = ELi MSe[^^ 

In the simulations we used MATPOWER 4.0 test case 
IEEE-118 (TV = 118) system. We take the load profile from 
the UK National Grid load curve from |52| and scale the 
base load from MATPOWER on load buses. Then we run 
the Optimal Power Flow (OPE) program to determine the 
generation dispatch over this period. This gives us the true 
state v[t] and all the power quantities f(v[t]) over this time 
horizon, which are all expressed in per unit (p.u.) values. 
The sensor observations are generated by adding independent 
errors ri^rn[t] ^ A/'(0,cr^) with a = 10"^. We divide the 
system into / = 10 areas where 9 areas has 12 buses in each 
area and 1 area has 10 buses, all chosen at random from 1 
to 118. Eor each area, we randomly choose 50% of all the 
available measurements and particularly exploit the 36 PMU 
measurements in Area 1, 2 and 3. The optimization of PMU 
selection is beyond the scope of this paper and hence not 
pursued here. In the first snapshot where there is no previous 
state estimate, we choose the flat profile sv[t] = [1^ Oat]^- 

A. Comparison with Diffusion Algorithms without Bad Data 

In this subsection, we evaluate the overall performance 
of the DARSE scheme against existing network diffusion 
algorithms pT| and its extension to adaptive processing in 
(251 over 3 snapshots of measurements. However, the com- 
munication protocol in j2T|, p5| requires agents to exchange 
information synchronously, and thus the URE protocol men- 
tioned in Section Ull-B I does not fit the context. To make a fair 
comparison in terms of communication costs and accuracy, 
we modify the URE protocol for this particular comparison to 
a synchronous deterministic exchange protocol, where com- 
munication links exist between every two agents {i^j} G M 
for Vz, j G X, giving an adjacency matrix A = IjlJ — I. 
The weight matrix is doubly stochastic in both cases con- 
structed according to the Laplacian L = diag(Al/) — A as 
W = 1/ — wL with w = a/ max(Al/) and a = 0.03. For 
simplicity, we also do not simulate measurements that have 
bad data in this particular comparison. The step-sizes for the 
approach in (2l| , pst are chosen as adiff = 10~^, 10~^, 0.05. 

The diffusion algorithm proceeds at each exchange i, while 
the DARSE runs £k = = = 3 exchanges for each 
update. Thus, the comparison is made on the same time scale 
by counting the number of gossip exchanges. Because we use 
ii, = 3 gossip exchanges between every two descent updates 
k = I,-- - ,20, we have a total number of 60 exchanges 




DARSE 

Distributed State Estimation via Diffusion Algorithm [21][25] a=0.001 
Distributed State Estimation via Diffusion Algorithm [21][25] a=0.005 
Distributed State Estimation via Diffusion Algorithm [21][25] a=0.01 



20 40 60 80 100 120 140 160 180 
Gossip Exchange Index 

(a) Valfc 




60 80 100 120 
Gossip Excliange Index 

(b) MSE^^ 




Fig. 2. 
using 



60 80 100 120 
Gossip Exchange Index 

(c) MSE^^^ 

Comparison between DARSE and diffusion algorithms in (2T| , (25) 

= 3 exchanges for every update. 



per snapshot. It can be seen from Fig. 2(a) to 2(c) that the 
DARSE scheme converges to the ML estimate after k = 15 
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40 60 80 

Gauss-Newton Update k 



(a) Valfe 




40 60 80 

Gauss-Newton Update k 

(b) MSE^^ 




40 60 
Gauss-Newton Update k 

(c) MSE^^^ 

Fig. 3. Performance of DARSE against centralized GN with and without 
bad data in terms of Valfc, MSE^^ and MSE^^^ 



updates (i.e., 45 message exchanges) for every snapshot and it 
tracks the state estimate accurately when new measurements 
stream in. The spikes observed in the plots are caused by the 
new measurements. Since the number of gossip exchanges is 
limited, the diffusion algorithm in |[2T| and p5| suffers from 



slow convergence and fails to track the state accurately. 

The reason for the fast convergence of the GGN algorithm is 
intrinsically that the GGN algorithm achieves the convergence 
rate of the GN algorithm, which converges quadratically when 
the noise level is low, while the diffusion algorithm is a first 
order sub-gradient method that converges sub-linearly. The fast 
convergence is partly due to the difference in computation 
complexity, where our algorithm is dominated by the matrix 
inversion on the order of 0{N^), while the diffusion algorithm 
scales like 0{N). This requires the local processor to have 
the capability to maintain such computations on time for each 
exchange. There is literature on reducing the computation cost 
for matrix inversions, but this is beyond the scope of this paper. 



B. Comparison with Centralized GN Approach 

In this simulation, we added random outliers errors with 
variances ei^m[t] = lOa^ on 25 randomly selected measure- 
ments for 6 measurement snapshots. We examine the MSB 
performance of the DARSE scheme where, in each snapshot 
t, each agent talks to another agent 20 times on average during 
the interval [r/^, r/e+i) for all /c = 1, • • • , 20. In this case, the 
network communication volume is on the order of the network 
diameter 0{N), which implies the number of transmissions 
in the centralized scheme as if the local measurements are 
relayed and routed through the entire network. Furthermore, 
we examine the performance of the DARSE for cases with 
random link failures, where any established link {i^j} G A4 
fails with probability p = 0.1 independently. It is clear that 
this communication model with link failures may not satisfy 
Condition [T] |2] & [3] but the numerical results show that our 
approach is robust and degrades gracefully. 

To demonstrate the effectiveness of the DARSE scheme 
with bad data, we compare it with the centralized GN pro- 
cedure with and without bad data, where the situation without 
bad data serves as the ultimate benchmark. Clearly, it can 
be seen from Fig. 



3(a) to 3(c) that DARSE sometimes has 



a certain performance loss compared with the centralized GN 
without bad data. Sometimes DARSE outperforms the cen- 
tralized GN algorithm because the re-weighting numerically 
leads to certain improvement, but this is due to the fact that 
the measurements with greater variance influence less the state 
estimates rather than an intrinsic behavior of the algorithm. On 
the other hand, when bad data are present, the DARSE scheme 
outperforms significantly the centralized GN approach without 
re-weighting, thanks to the bad data suppression. 



VI. Conclusions 

In this paper, we propose a DARSE scheme for hybrid 
power system state estimation integrating seamlessly WAMS 
an SCADA measurement system, which adaptively estimates 
the global state vector along with an updated noise co variance. 
The numerical results show that the DARSE scheme is able 
to deliver accurate estimates of the entire state vector at each 
distributed area, even in the presence of bad data and random 
communication link failures. 
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Appendix A 
Power Flow Equations and Jacobian Matrix 

Each line is characterized by the admittance matrix Y = 
[—ynm]NxN, which includcs Hne admittances Ynm = Gnm + 
iBnm, {n,m} G f, shunt admittances Ynm = Gnm + i^nm 
in the H-model of Hne {n, m} G £, and self-admittance 
Ynn = -Em/n(^nm + >^nm). Using the Canonical basis 
= [0, • • • 7 0]^ and Y, we define the following 

Y — p p^Y Y — (Y + y "ip p^ — y p p^ 

Letting = J?{Y,}, = ^{Y,}, G,^ = K{Y,^} and 
Bnm = ^{Ynm}, wc define the following matrices 



G„ 
Gn 



-B 
G„ 

-B 
G 



nm 





nm 
nm 



B TO, TOO, 



Nq,„ 
^J.nm 



Bn 






Bn 


BnTTi 


G 


Grnm B^ 


Brim 








^nm 



State estimation mainly uses power injection and flow 
measurements from SCADA systems or, if available, PMU 
measurements from WAMS. Traditional SCADA systems ag- 
gregate data from the so called Remote Terminal Units (RTU), 
refreshing the data every Tscada = 2 to 5 seconds and 
collects active/reactive injection (Pn^Qn) at bus n and flow 
{Pnm, Qnm) at bus u on line {n, m} 



Pnm — ^ ^P.nm 



nm — ^ ^nm^ -) 



Pn = V^Np,nV, 
Qn V^Ng^nV, 

and stack them in the power flow equations 

fi(v) = [---,P, ^ 1^ 

f^(v) = [••• ,P, 



^n? J 
7 Qnm-) 



(46) 
(47) 

(48) 
(49) 



The WAMS generate data at a much faster pace compared 
to SCADA systems, with TpMu = 1/120 to 1/30 second. The 
PMU data are gathered at Phasor Data Concentrators (PDC), 
which collects the voltage ^{K,}) at bus n and the 

current (/nm, Jnm) on line {n, m} measured at bus n 

Inm = (I2 ^ en)^ C/,nmV (50) 
Jnm = (I2 ^ ^nY ^J,nm^, (51) 

where (g) is the Kronecker product, and stacks them as 

■ ' * 7 Jnm^ ' ' ' ] • (^^) 

The Jacobian F(v) can be derived from ( [5Q| , ( |46| ) and ( |47| ) 



fv(v) = V, fc(v) = [••• ,/nmr 



F(v) = 



I2JV 
He 

(l2Ar ® v)^H2 



(I 



^2N 

AE ( 



(53) 



where 



N 



H 



■c 



HT 
' J,n^' 



N, 

'Q,nm 
T 



Q,n^ 



using Sn = I^;^ (8) (I2 (g e^) with En being the number of 
incident lines at bus n and 



H/,n — SnC/^n, C/^^ — [' ' ' 7 Cj^^^, ' ' ' ]^ (54) 



Appendix B 
Proof of Lemma 1 

The F-norm inequality || • || < || • ||f gives us 
||F,(v)-F,(vOf <||F.(v)-F,(vO| 



(55) 



for all i. Since F^(v) 



-1/2 



TiF(v), we further use the 



multiplicative norm inequality ||AB||^ < ||A||^||B||^ 



Fi(v)-F,(v') 

From ( |53l ), we have 
F(v) - F(v' 



-1/2 



Ti [F(v) - F(v') 



<A-L(r.)||F(v)-F(v') 



1^ 



02Arx2Af 

04Lx2Ar 
l2Ar ® (V - V')n Hi 
I4L®(V-V')^]H^. 

2 



(56) 
(57) 



(58) 



According to the F-norm definition || A||J, = Tr (A^A) and 
the properties of the trace operator, we have for any v, 

||F(v) - F(v')||^ = Tr [{I2N ® (v - v')(v - v')^) HiHJ] 
+ Tr [(I4L ® (v - v')(v - v')^) H^H^ 



Expanding Hj and Hjr in ( |53| i and using their symmetric 
properties, we have 

||F(v) - F{v')\\l = (v - v')^M(v - v'), (59) 

where M = HjHj + H^Hjr. It is well-known that any 
quadratic form of a symmetric matrix can be bounded as 



(v - v')^M(v 



/) < llMllllv-v'll'. 



(60) 



|M||A-i„(r,). 



The result follows by setting uj = max^ 
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